Assembly Language
© Copyright Brian Brown, 1988-2000. All rights reserved.
THE HISTORY OF ASSEMBLY LANGUAGE PROGRAMMING, Part 1
Early computer systems were literally programmed by hand. Front panel switches were used to enter instructions and data. These switches represented the address, data and control lines of the computer system. To enter data into memory, the address switches were toggled to the correct address, the data switches were toggled next, and finally the WRITE switch was toggled. This wrote the binary value on the front panel data switches to the address specified. Once all the data and instructions were entered, the RUN switch was toggled to run the program.
The programmer also needed to know the instruction set of the processor. Each instruction needed to be manually converted into bit patterns by the programmer so the front panel switches could be set correctly. This led to errors in translation as the programmer could easily misread 8 as the value B. It became obvious that such methods were slow and error prone.
With the advent of better hardware which could address larger memory, and the increase in memory size (due to better production techniques and lower cost), programs were written to perform some of this manual entry. Small monitor programs became popular, which allowed entry of instructions and data via hex keypads or terminals. Additional devices such as paper tape and punched cards became popular as storage methods for programs.
Programs were still hand-coded, in that the conversion from mnemonics to instructions was still performed manually. To increase programmer productivity, the idea of writing a program to translate another program was a major breakthrough. This translating program would be run by the computer, converting the mnemonics into actual machine instructions. The benefits of such a program would be
As programmers were writing the source code in mnemonics anyway, this seemed the logical next step. The source file was fed as input into the program, which translated the mnemonics into instructions, then wrote the output to the desired place (paper tape, etc.). This sequence is now commonplace.
Since then, the only major advance has been the growing use of high-level languages to improve programmer productivity.
Assembly language programming is writing machine instructions in mnemonic form, using an assembler to convert these mnemonics into actual processor instructions and associated data.
The disadvantages of assembly language programming are
THE PROGRAM TRANSLATION SEQUENCE
In developing a software program to accomplish a particular task, the implementor chooses an appropriate language, develops the algorithm (a sequence of steps which, when carried out in the order prescribed, achieves the desired result), implements this algorithm in the chosen language (coding), then tests and debugs the final result.
There is also a probable maintenance phase. The program written in the chosen language will undoubtedly need to be converted into the appropriate binary bit patterns which make sense to the target processor (the processor on which the software will be run). This process of conversion is called translation.
The following diagram illustrates the translation sequence necessary to generate machine code from specific languages.
ASSEMBLY LANGUAGE PROGRAMMING
Assemblers are programs which generate machine code instructions from a source code program written in assembly language. The features provided by an assembler are,
In writing assembly language programs for micro-computers, it is essential that a standardized format be followed. Most manufacturers provide assemblers, which are programs used to generate machine code instructions for the actual processor to execute.
The assembler converts the written assembly language source program into a format which runs on the processor. In the source program, each machine code instruction (the binary or hex value) is replaced by a mnemonic. A mnemonic is an abbreviation which represents the actual instruction.
+----------+-----+----------+--------------------------+
| Binary   | Hex | Mnemonic | Action                   |
+----------+-----+----------+--------------------------+
| 01001111 | 4F  | CLRA     | Clears the A accumulator |
| 00110110 | 36  | PSHA     | Saves A acc on stack     |
| 01001101 | 4D  | TSTA     | Tests A acc for 0        |
+----------+-----+----------+--------------------------+
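At its simplest, this substitution is a table lookup that works in both directions. The following is a minimal Python sketch, assuming only the three Motorola 6800-style opcodes shown in the table; the names OPCODES, encode and decode are hypothetical and not part of any real assembler:

```python
# Hypothetical lookup table holding only the three rows shown above.
OPCODES = {
    "CLRA": 0x4F,  # clears the A accumulator
    "PSHA": 0x36,  # saves the A accumulator on the stack
    "TSTA": 0x4D,  # tests the A accumulator for 0
}


def encode(mnemonic: str) -> int:
    """Assembler direction: mnemonic -> machine code byte."""
    return OPCODES[mnemonic.upper()]


def decode(byte: int) -> str:
    """Disassembler direction: machine code byte -> mnemonic."""
    for name, code in OPCODES.items():
        if code == byte:
            return name
    raise ValueError(f"unknown opcode {byte:#04x}")


for name in OPCODES:
    code = encode(name)
    # Print the same three columns as the table: binary, hex, mnemonic.
    print(f"{code:08b}  {code:02X}  {name}")
```

Running it prints the Binary, Hex and Mnemonic columns of the table, one row per line.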
Mnemonics are used because they
Assemblers also accept certain characters as representing number bases and addressing modes.
$ prefix or h suffix for hexadecimal   $24 or 24h
D for decimal numbers                  24D, 67
B for binary numbers                   0101111B
O or Q for octal numbers               377O, 232Q
# for immediate addressing             LDAA #$34
,X for indexed addressing              LDAA 01,X
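To make these conventions concrete, here is a minimal Python sketch of how an assembler might interpret them. It assumes exactly the markers listed above (a $ prefix, or an H, D, B, O or Q suffix, with unmarked numbers taken as decimal); parse_number and addressing_mode are hypothetical helpers, and real assemblers differ in the details:

```python
def parse_number(token: str) -> int:
    """Convert a numeric literal in one of the notations above to an int.

    Hypothetical helper: real assemblers differ in which markers they accept.
    """
    t = token.strip()
    if t.startswith("$"):                 # $24: hexadecimal prefix
        return int(t[1:], 16)
    bases = {"H": 16, "D": 10, "B": 2, "O": 8, "Q": 8}
    suffix = t[-1].upper()
    if suffix in bases:                   # 24h, 24D, 0101111B, 377O, 232Q
        return int(t[:-1], bases[suffix])
    return int(t, 10)                     # no marker: assume decimal, as in 67


def addressing_mode(operand: str) -> str:
    """Classify an operand by the # and ,X markers (simplified)."""
    op = operand.strip().upper()
    if op.startswith("#"):
        return "immediate"                # LDAA #$34
    if op.endswith(",X"):
        return "indexed"                  # LDAA 01,X
    return "address"                      # a plain address or label


for tok in ("$24", "24h", "24D", "67", "0101111B", "377O", "232Q"):
    print(tok, "=", parse_number(tok))
```

The loop prints $24 = 36, 24h = 36, 24D = 24, 67 = 67, 0101111B = 47, 377O = 255 and 232Q = 154, one per line. Checking the suffix only after ruling out the $ prefix means a value such as $3B is still read as hexadecimal rather than binary.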
Assembly language statements are written one per line. An assembly language program thus consists of a sequence of assembly language statements, where each statement contains a mnemonic. Each line of an assembly language program is split into four fields, as shown below
LABEL OPCODE OPERAND COMMENTS
The label field is optional. A label is an identifier (or text string symbol). Labels are used extensively in programs to reduce reliance upon programmers remembering where data or code is located. A label can be used to refer to
The maximum length of a label differs between assemblers. Some accept labels up to 32 characters long, others only four characters. A label, when declared, is suffixed by a colon and begins with a valid character (A..Z). Consider the following example.
START: LDAA #24H
Here, the label START is equal to the address of the instruction LDAA #24H. The label is used in the program as a reference, eg,
JMP START
This would result in the processor jumping to the location (address) associated with the label START, thus executing the instruction LDAA #24H immediately after the JMP instruction. When a label is referenced later on in the program, it is written without the colon suffix.
An advantage of using labels is that inserting or re-arranging code statements does not necessitate re-working actual machine instructions. A simple re-assembly is all that is required. In hand-coding, such changes can take hours to perform.
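A minimal Python sketch of why re-assembly is enough: the assembler recomputes every label address from the sizes of the statements before it. It assumes a hypothetical three-statement program (CLRA, then START: LDAA #24H, then JMP START) with 6800-style sizes of 1, 2 and 3 bytes and an origin of 0100 hex; assign_addresses is a hypothetical name:

```python
def assign_addresses(statements, origin=0x0100):
    """Map each label to its address.

    statements is a list of (label or None, size in bytes) pairs.
    """
    symbols = {}
    address = origin
    for label, size in statements:
        if label is not None:
            symbols[label] = address
        address += size
    return symbols


# Assumed sizes: CLRA = 1 byte, LDAA #24H = 2 bytes, JMP START = 3 bytes.
program = [(None, 1), ("START", 2), (None, 3)]
print(f"START = {assign_addresses(program)['START']:04X}")  # START = 0101

# Insert a one-byte instruction (e.g. PSHA) at the front and re-assemble.
program.insert(0, (None, 1))
print(f"START = {assign_addresses(program)['START']:04X}")  # START = 0102
```

The source line JMP START never changes; only the address that the assembler substitutes for START does.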
Each instruction consists of an opcode and possibly one or more operands. In the instruction
JMP START
the opcode is JMP and the operand is the address of the label START.
The opcode field contains a mnemonic. Opcode stands for operation code, ie, a machine code instruction. The opcode may also require additional information (operands). This additional information is separated from the opcode by using a space (or tab stop).
The operand field consists of additional information or data that the opcode requires. In certain types of addressing modes, the operand is used to specify
Examples of operands are
TAB          ; operand specified by opcode
LDAA 0100H   ; two byte operand
LDAA START   ; label operand
LDAA #0FH    ; immediate operand
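The following is a rough Python sketch of how these four operand forms might be told apart. It relies on a convention common to many assemblers (not stated in the text above): a number written with an H suffix starts with a digit, as in 0100H or 0FH, so it cannot be mistaken for a label. The helpers split_statement and classify_operand are hypothetical and handle only label-free statements:

```python
def split_statement(line: str):
    """Return (opcode, operand) for a label-free statement; the comment is dropped."""
    code = line.split(";", 1)[0]       # everything after ';' is a comment
    parts = code.split()
    opcode = parts[0]
    operand = parts[1] if len(parts) > 1 else ""
    return opcode, operand


def classify_operand(operand: str) -> str:
    """Rough classification of an operand field (simplified, hypothetical rules)."""
    if not operand:
        return "specified by opcode"   # e.g. TAB
    if operand.startswith("#"):
        return "immediate"             # e.g. LDAA #0FH
    if operand[0].isdigit() or operand.startswith("$"):
        return "address"               # e.g. LDAA 0100H
    return "label"                     # e.g. LDAA START


for statement in ("TAB", "LDAA 0100H", "LDAA START", "LDAA #0FH"):
    opcode, operand = split_statement(statement)
    print(f"{opcode:5} {operand:7} -> {classify_operand(operand)}")
```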
The comment field is optional, and is used by the programmer to explain how the coded program works. Comments are preceded by a semi-colon. The assembler, when generating instructions from the source file, ignores all comments. Consider the following examples,
; H means hexadecimal values
        ORG  0100H   ;This program starts at address 0100 hex
STATUS: DFB  23H     ;This byte is identified as STATUS, and is
                     ;initialized to a value of 23 hex
CODE:   LDAA STATUS  ;The label called CODE is identified as a
                     ;machine code instruction which loads the
                     ;A accumulator with the contents of the
                     ;memory location associated with the label
                     ;STATUS, ie, the value 23
        JMP  CODE    ;Jump to the address associated with CODE
Note that the programmer does not need to worry about bit patterns, hex values, and the addresses of STATUS or CODE. The assembler, when fed the above program, will generate the correct code. The code output from the assembler will be,
Memory location  Byte value
0100             23
0101             B6
0102             01
0103             00
0104             7E
0105             01
0106             01

Location 0100 holds the value associated with the label STATUS
Locations 0101 to 0103 perform the LDAA STATUS instruction
Locations 0104 to 0106 perform the JMP CODE instruction
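The following is a minimal Python sketch that reproduces this listing from the source program. It is not a real 6800 assembler: it assumes only ORG, DFB and the extended-addressing forms of LDAA (B6) and JMP (7E), the comments in the source are shortened, and the names fields, hex_value, size and assemble are hypothetical:

```python
# Assumed opcodes: Motorola 6800 LDAA and JMP with a 16-bit (extended) address.
OPCODES = {"LDAA": 0xB6, "JMP": 0x7E}

SOURCE = """\
; H means hexadecimal values
        ORG  0100H   ;This program starts at address 0100 hex
STATUS: DFB  23H     ;This byte is identified as STATUS
CODE:   LDAA STATUS  ;Load A with the contents of STATUS
        JMP  CODE    ;Jump to the address associated with CODE
"""


def fields(line):
    """Split a source line into (label, opcode, operand); the comment is discarded."""
    code = line.split(";", 1)[0]
    label = None
    if ":" in code:
        label, code = code.split(":", 1)
        label = label.strip()
    parts = code.split()
    opcode = parts[0] if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, opcode, operand


def hex_value(text):
    """Convert a literal such as 0100H or 23H to an int."""
    return int(text.rstrip("Hh"), 16)


def size(opcode):
    """Bytes generated: none for ORG, one for DFB, three for LDAA/JMP."""
    return {"ORG": 0, "DFB": 1}.get(opcode, 3)


def assemble(source):
    statements = [fields(line) for line in source.splitlines()]
    statements = [s for s in statements if s[1] is not None]  # skip comment-only lines

    # Pass 1: record the address of every label.
    symbols, location = {}, 0
    for label, opcode, operand in statements:
        if opcode == "ORG":
            location = hex_value(operand)
        if label:
            symbols[label] = location
        location += size(opcode)

    # Pass 2: emit bytes, replacing labels with addresses from the symbol table.
    memory, location = {}, 0
    for _, opcode, operand in statements:
        if opcode == "ORG":
            location = hex_value(operand)
        elif opcode == "DFB":
            memory[location] = hex_value(operand)
            location += 1
        else:
            address = symbols[operand]
            for byte in (OPCODES[opcode], address >> 8, address & 0xFF):
                memory[location] = byte
                location += 1
    return memory


for address, byte in sorted(assemble(SOURCE).items()):
    print(f"{address:04X} {byte:02X}")
```

The address is emitted high byte first, which is why LDAA STATUS becomes B6 01 00 and JMP CODE becomes 7E 01 01.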
The statement ORG 0100H in the above program is not a machine code instruction. It is an instruction to the assembler, which instructs the assembler to generate the code to run at the designated origin address. Instructions to assemblers are called pseudo-ops. These are used for
The assembler does not generate any machine code instructions for pseudo-ops or comments. Assemblers scan the source program, generating machine instructions. Sometimes, the assembler reaches a reference to a variable which has not yet been defined. This is referred to as a forward reference problem. The assembler can tackle this problem in a number of ways. It is resolved in a two-pass assembler as follows,
On the first pass, the assembler simply reads the source file, counting up the number of locations that each instruction will take, and builds a symbol table in memory which lists all the defined variables cross-referenced to their associated memory address. On the second pass, the assembler substitutes opcodes for the mnemonics, and variable names are replaced by the memory locations obtained from the symbol table.
OPERATION OF A TWO-PASS ASSEMBLER
Consider the following source code program for a hypothetical computer. The program computes the so-called Fibonacci numbers, printing all such numbers up to the value specified by LIMIT.
Line  Label   Operation  Operand 1  Operand 2
1             COPY       ZERO       OLDER
2             COPY       ONE        OLD
3             READ       LIMIT
4             WRITE      OLD
5     FRONT:  LOAD       OLDER
6             ADD        OLD
7             STORE      NEW
8             SUB        LIMIT
9             BRPOS      FINAL
10            WRITE      NEW
11            COPY       OLD        OLDER
12            COPY       NEW        OLD
13            BR         FRONT
14    FINAL:  WRITE      LIMIT
15            STOP
16    ZERO:   CONST      0
17    ONE:    CONST      1
18    OLDER:  SPACE
19    OLD:    SPACE
20    NEW:    SPACE
21    LIMIT:  SPACE
The instruction set of the computer is as follows,
Symbolic  Machine  Length  Number of  Action
Opcode    Opcode           Operands
ADD       02       2       1          ACC <- ACC + OPD1
BR        00       2       1          Branch to OPD1
BRPOS     01       2       1          Branch to OPD1 if ACC > 0
COPY      13       3       2          OPD2 <- OPD1
LOAD      03       2       1          ACC <- OPD1
READ      12       2       1          OPD1 <- input stream
STOP      11       1       0          Halt execution
STORE     07       2       1          OPD1 <- ACC
SUB       06       2       1          ACC <- ACC - OPD1
WRITE     08       2       1          output stream <- OPD1
The functions that the assembler will perform in translating the program are,
IMPLEMENTATION
The assembler uses two counters to keep track of the machine language program. One counter, called the location counter, keeps track of the physical address location being used, and will initially be set to zero for this program (or the value designated by the ORG directive).
The other counter is the line counter, which keeps track of the line number being processed. After each source line has been examined on the first pass, the location counter is incremented by the correct number of bytes.
When the assembler processes line 1 of the source, it cannot replace the symbols ZERO and OLDER by their addresses because those symbols have not yet been defined. This is called a forward reference problem.
The assembler will place the symbols into the symbol table, determine that the instruction occupies three bytes, advance the location counter to 3, then proceed to process the next source line. After processing line 3 of the source, the current state will be,
Line  Address  Label  Operation  OPD1   OPD2
1     0               COPY       ZERO   OLDER
2     3               COPY       ONE    OLD
3     6               READ       LIMIT
and the contents of the symbol table will be
Symbol  Address
ZERO    ---
OLDER   ---
ONE     ---
OLD     ---
LIMIT   ---

Location Counter: 8
Line Counter: 4
The symbol table currently holds five symbols, none of which yet has an address. During processing of line 4, the assembler picks up the symbol OLD. It establishes that it is already in the symbol table, so does not enter it again.
While processing line 5, the assembler encounters the label FRONT and enters it into the symbol table. Because the assembler already knows its address (10), the address is placed in the table as well. After processing line 9 of the program, the current state is,
Line  Address  Label  Operation  OPD1   OPD2
1     0               COPY       ZERO   OLDER
2     3               COPY       ONE    OLD
3     6               READ       LIMIT
4     8               WRITE      OLD
5     10       FRONT  LOAD       OLDER
6     12              ADD        OLD
7     14              STORE      NEW
8     16              SUB        LIMIT
9     18              BRPOS      FINAL
and the contents of the symbol table will be
Symbol  Address
ZERO    ---
OLDER   ---
ONE     ---
OLD     ---
LIMIT   ---
FRONT   10
NEW     ---
FINAL   ---

Location Counter: 20
Line Counter: 10
The first pass continues, building up the symbol table. When the assembler determines the address of the various symbols in lines 16 to 21, these are entered into the table. At the end of pass 1, the symbol table should list all declared symbols as well as their addresses.
The state at the end of the first pass is,
Line  Address  Label  Operation  OPD1   OPD2
1     0               COPY       ZERO   OLDER
2     3               COPY       ONE    OLD
3     6               READ       LIMIT
4     8               WRITE      OLD
5     10       FRONT  LOAD       OLDER
6     12              ADD        OLD
7     14              STORE      NEW
8     16              SUB        LIMIT
9     18              BRPOS      FINAL
10    20              WRITE      NEW
11    22              COPY       OLD    OLDER
12    25              COPY       NEW    OLD
13    28              BR         FRONT
14    30       FINAL  WRITE      LIMIT
15    32              STOP
16    33       ZERO   CONST      0
17    34       ONE    CONST      1
18    35       OLDER  SPACE
19    36       OLD    SPACE
20    37       NEW    SPACE
21    38       LIMIT  SPACE
and the contents of the symbol table will be
Symbol  Address
ZERO    33
OLDER   35
ONE     34
OLD     36
LIMIT   38
FRONT   10
NEW     37
FINAL   30

Location Counter: 39
Line Counter: 22
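Pass 1 can be sketched in Python as follows. The sketch assumes that CONST and SPACE are assembler directives that each reserve one location (as the addresses in the listing imply) and that a label is marked by a trailing colon. The helper names parse, length and pass1 are hypothetical; the optional stop_after argument exists only so the intermediate states described above (after lines 3 and 9) can be reproduced:

```python
# Instruction set from the table: symbolic opcode -> (machine opcode, length).
INSTRUCTIONS = {
    "ADD": ("02", 2), "BR": ("00", 2), "BRPOS": ("01", 2), "COPY": ("13", 3),
    "LOAD": ("03", 2), "READ": ("12", 2), "STOP": ("11", 1), "STORE": ("07", 2),
    "SUB": ("06", 2), "WRITE": ("08", 2),
}

SOURCE = """\
        COPY  ZERO OLDER
        COPY  ONE  OLD
        READ  LIMIT
        WRITE OLD
FRONT:  LOAD  OLDER
        ADD   OLD
        STORE NEW
        SUB   LIMIT
        BRPOS FINAL
        WRITE NEW
        COPY  OLD  OLDER
        COPY  NEW  OLD
        BR    FRONT
FINAL:  WRITE LIMIT
        STOP
ZERO:   CONST 0
ONE:    CONST 1
OLDER:  SPACE
OLD:    SPACE
NEW:    SPACE
LIMIT:  SPACE
"""


def parse(source):
    """Split each non-blank line into (label, operation, operands)."""
    lines = []
    for text in source.splitlines():
        tokens = text.split()
        if not tokens:
            continue
        label = tokens.pop(0)[:-1] if tokens[0].endswith(":") else None
        lines.append((label, tokens[0], tokens[1:]))
    return lines


def length(operation):
    """CONST and SPACE reserve one location each (assumed); instructions use the table."""
    if operation in ("CONST", "SPACE"):
        return 1
    return INSTRUCTIONS[operation][1]


def pass1(lines, stop_after=None):
    """Build the symbol table; return (symbols, location counter, line counter)."""
    symbols = {}                    # symbol -> address, or None while undefined
    location, line_counter = 0, 1
    for label, operation, operands in lines:
        if stop_after is not None and line_counter > stop_after:
            break
        if label:
            symbols[label] = location             # definition: address now known
        if operation != "CONST":                  # CONST's operand is a value
            for name in operands:
                symbols.setdefault(name, None)    # forward reference: no address yet
        location += length(operation)
        line_counter += 1
    return symbols, location, line_counter


lines = parse(SOURCE)
for stop in (3, 9, None):           # the three states shown in the text
    symbols, location, line_counter = pass1(lines, stop)
    shown = {s: "---" if a is None else a for s, a in symbols.items()}
    print(shown, "Location Counter:", location, "Line Counter:", line_counter)
```

Because a Python dict keeps insertion order, the symbols appear in the same order as in the tables, and a forward-referenced symbol keeps its place when its address is finally filled in.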
Code generation is performed on the second pass. Before starting, the line and location counters will be reset to 1 and 0 respectively. The assembler now generates one line of object code for each source line. Line one is translated to
Address  Length  Opcode  OPD1  OPD2
00       3       13      33    35
Successive lines are translated in the same manner. On encountering the label FRONT in line 5, the assembler ignores it, since its address is already in the symbol table. For lines 16 and 17, the constants 0 and 1 are placed in memory. For lines 18 to 21, where space is reserved for variables, the assembler may leave the locations undefined (shown as xx) or initialize them to zero. The object code generated by the second pass will be,
Address  Length  Opcode  OPD1  OPD2
00       3       13      33    35
03       3       13      34    36
06       2       12      38
08       2       08      36
10       2       03      35
12       2       02      36
14       2       07      37
16       2       06      38
18       2       01      30
20       2       08      37
22       3       13      36    35
25       3       13      37    36
28       2       00      10
30       2       08      38
32       1       11
33       1       00
34       1       01
35       1       xx
36       1       xx
37       1       xx
38       1       xx
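Pass 2 can be sketched in Python as below. To run on its own, the sketch repeats a simplified (hypothetical) pass 1; it again assumes CONST and SPACE each occupy one location and that labels end in a colon, and it follows the option of leaving SPACE locations undefined, printed as xx. An assembler that zero-fills them would emit 00 instead. The machine opcodes are kept as the two-character codes from the instruction set table:

```python
# Instruction set from the table: symbolic opcode -> (machine opcode, length).
INSTRUCTIONS = {
    "ADD": ("02", 2), "BR": ("00", 2), "BRPOS": ("01", 2), "COPY": ("13", 3),
    "LOAD": ("03", 2), "READ": ("12", 2), "STOP": ("11", 1), "STORE": ("07", 2),
    "SUB": ("06", 2), "WRITE": ("08", 2),
}

SOURCE = """\
        COPY  ZERO OLDER
        COPY  ONE  OLD
        READ  LIMIT
        WRITE OLD
FRONT:  LOAD  OLDER
        ADD   OLD
        STORE NEW
        SUB   LIMIT
        BRPOS FINAL
        WRITE NEW
        COPY  OLD  OLDER
        COPY  NEW  OLD
        BR    FRONT
FINAL:  WRITE LIMIT
        STOP
ZERO:   CONST 0
ONE:    CONST 1
OLDER:  SPACE
OLD:    SPACE
NEW:    SPACE
LIMIT:  SPACE
"""


def parse(source):
    """Split each non-blank line into (label, operation, operands)."""
    lines = []
    for text in source.splitlines():
        tokens = text.split()
        if not tokens:
            continue
        label = tokens.pop(0)[:-1] if tokens[0].endswith(":") else None
        lines.append((label, tokens[0], tokens[1:]))
    return lines


def length(operation):
    """CONST and SPACE reserve one location each (assumed); instructions use the table."""
    if operation in ("CONST", "SPACE"):
        return 1
    return INSTRUCTIONS[operation][1]


def pass1(lines):
    """Simplified pass 1: return the symbol table (symbol -> address)."""
    symbols, location = {}, 0
    for label, operation, _ in lines:
        if label:
            symbols[label] = location
        location += length(operation)
    return symbols


def pass2(lines, symbols):
    """Return one (address, length, fields) entry per source line."""
    object_code, location = [], 0
    for _, operation, operands in lines:          # labels were handled in pass 1
        size = length(operation)
        if operation == "CONST":
            fields = [f"{int(operands[0]):02d}"]  # the constant's value
        elif operation == "SPACE":
            fields = ["xx"]                       # reserved, left undefined
        else:
            opcode = INSTRUCTIONS[operation][0]
            fields = [opcode] + [f"{symbols[name]:02d}" for name in operands]
        object_code.append((location, size, fields))
        location += size
    return object_code


lines = parse(SOURCE)
for address, size, fields in pass2(lines, pass1(lines)):
    print(f"{address:02d} {size} {' '.join(fields)}")
```

Each printed row matches one row of the object code listing; the first is 00 3 13 33 35. A line counter is not needed here because the sketch simply walks the parsed lines in order.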